Abstract: Big data is large-volume, heterogeneous, decentralized, distributed data with many dimensions. In big data applications, data collection has grown continuously, making it difficult to capture, manage, extract, and process data with existing software tools. Data analysis becomes expensive as the volume of data in the data warehouse grows. Data privacy is one of the main challenges in data mining with big data. To preserve user privacy, we need a method that protects the data while also increasing its utility. Existing centralized algorithms assume that all data resides at a central location for anonymization, which is not feasible for large-scale datasets, while existing distributed algorithms focus mainly on privacy preservation for large datasets rather than on scalability. The proposed system focuses on maintaining privacy for distributed data and overcomes the problems of the M-privacy and secrecy approaches with a new anonymization and slicing technique. Our main goal is to publish a genuine or anonymized view of the integrated data that is immune to attacks. We use the MR-Cube approach, which addresses the challenges of large-scale cube computation with holistic measures. The slicing technique comprises tuple partitioning, generalization, slicing, and anonymization. Once slicing is done, users can freely access the anonymized data with greater data availability.
Keywords: Big Data, Hadoop, MapReduce, HDFS, MR-Cube, Data Security, Slicing.